What is Cross-Sell?
Cross-selling in insurance is the act of promoting products that are related or complementary to the one(s) your current customers already own or use. It is one of the most effective methods of marketing.
Client Profile:
An insurance company that provides medical insurance to its customers wants to know how many of its existing policyholders (customers) from last year will also be interested in the Vehicle Insurance provided by the company.
What is an Insurance Policy? An insurance policy is an arrangement by which a company undertakes to provide a guarantee of compensation for specified loss, damage, illness, or death in return for the payment of a specified premium. A premium is a sum of money that the customer needs to pay regularly to an insurance company for this guarantee.
What is Vehicle Insurance? Vehicle insurance is insurance for cars, trucks, motorcycles, and other road vehicles. Every year the customer pays a premium of a certain amount to the insurance provider, which in return provides financial protection against physical damage or bodily injury resulting from traffic collisions and against liability that could also arise from incidents in a vehicle.
Knowing whether a customer would be interested in an additional insurance service like Vehicle Insurance is extremely helpful for the company, because it can then plan its communication strategy to reach out to those customers and optimize its business model and revenue. We have the following information to assist our analysis: demographics (gender, age, region code type), vehicles (Vehicle Age, Damage), policy (Premium, sourcing channel), etc.
Insurance was a familiar field to our team, and cross-selling is a widely used strategy in the insurance market. Hence, we decided to pursue cross-sell analytics.
While looking for effective ways to understand cross-selling in detail, we came across the following links:
https://www.yieldify.com/blog/cross-selling/
https://www.podium.com/article/cross-selling/
https://www.business.com/articles/how-to-boost-sales-with-cross-selling-and-cross-promotion/
After studying the data set, we realized which attributes contribute to the response of the customer. Following that we formulated our SMART question and sub-SMART questions around response.
Our SMART questions didn’t change but we modified them a little to better project the impact of independent attributes on the “Response” of customer.
This report is organized as follows:
As mentioned previously, our dataset houses 381109 observations across 12 variables. (See below for a readout of the dataset’s structure and variable names.) Variable descriptions are as follows and come from the following link; an asterisk next to a variable name indicates usage in our analysis.
dataset
For our exploratory data analysis, we ignored “id” because it is only an identifier with no relation to the customer’s “Response”.
The percentiles give an idea of the outliers: in Annual_Premium the third quartile is 39400 while the maximum is 540165, which points to outliers in this column.
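The percentile comparison above can be turned into an explicit rule. The following is a minimal sketch in Python, kept separate from the R workflow of this report, that flags values above the common upper fence Q3 + 1.5 × IQR. The premium values are illustrative assumptions (only the maximum mirrors the reported 540165), and `iqr_upper_fence` is a hypothetical helper name, not a function from the analysis.

```python
import statistics

def iqr_upper_fence(values):
    """Return Q3 + 1.5 * IQR, a common threshold for flagging high outliers."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 + 1.5 * (q3 - q1)

# Illustrative premiums (assumed values, not rows from the dataset);
# the last value mirrors the maximum reported for Annual_Premium.
premiums = [24000, 26000, 28000, 30000, 31000, 33000, 36000, 39000, 41000, 540165]

fence = iqr_upper_fence(premiums)
outliers = [p for p in premiums if p > fence]
print("upper fence:", fence)
print("flagged as outliers:", outliers)
```

Whether flagged values should be removed, capped, or kept is a separate modelling decision; the rule only identifies candidates.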
From the plot we can see that the response is imbalanced: only 12.6% of individuals are interested in purchasing vehicle insurance.
The variable Age appears right-skewed, and the count is highest at age 25. Age is important because the medians differ between the accepting and rejecting groups, as the box plot shows. Customers who accept the insurance tend to be older than those who do not.
There are slightly more male than female customers, and males are also slightly more likely to buy the insurance.
The numbers of customers with and without vehicle damage are almost the same. Those with vehicle damage are more interested in vehicle insurance.
Region Code 28 appears to have the most customers and also the most customers interested in vehicle insurance.
99% of customers have a driving license, and the customers interested in Vehicle Insurance have a driving license.
Customers who were not previously insured outnumber those who were. They are also more likely to buy the insurance.
Not many customers have owned a vehicle for more than 2 years, but some of them are interested in getting vehicle insurance. Most of the interested customers have had a vehicle for 1-2 years.
Looking at the box plot, we can see that the medians are almost at the same level; the variable is therefore not helpful, because it does not discriminate between accepting and rejecting.
Taking the log makes the distribution smoother, which is better for visualization. The box plot is useful because it shows that the medians are not at the same level, so there is some discrimination between the profiles. It is possible to infer that people with a greater Annual_Premium take the insurance.
A t-test is a type of inferential statistic used to determine whether there is a significant difference between the means of two groups, which may be related in certain features. The t-test is one of many tests used for the purpose of hypothesis testing in statistics. Calculating a t-test requires three key data values: the difference between the mean values from each data set (called the mean difference), the standard deviation of each group, and the number of data values in each group. For the t-test, we split the original dataset “vehicle” into subsets of customers who “accepted” or “rejected” the insurance.
##
## One Sample t-test
##
## data: accepted$Age
## t = 759, df = 45154, p-value <2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 43.2 43.4
## sample estimates:
## mean of x
## 43.3
##
## One Sample t-test
##
## data: rejected$Age
## t = 1379, df = 3e+05, p-value <2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 38.0 38.1
## sample estimates:
## mean of x
## 38
##
## One Sample t-test
##
## data: accepted$Policy_Sales_Channel
## t = 352, df = 45154, p-value <2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 92.2 93.2
## sample estimates:
## mean of x
## 92.7
##
## One Sample t-test
##
## data: rejected$Policy_Sales_Channel
## t = 1237, df = 3e+05, p-value <2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 115 115
## sample estimates:
## mean of x
## 115
##
## One Sample t-test
##
## data: accepted$Vintage
## t = 391, df = 45154, p-value <2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 153 155
## sample estimates:
## mean of x
## 154
##
## One Sample t-test
##
## data: rejected$Vintage
## t = 1053, df = 3e+05, p-value <2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 154 155
## sample estimates:
## mean of x
## 154
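In every t-test above, the p-value is below alpha (0.05), so the null hypothesis stated in the output (true mean equal to 0) is rejected for Age, Policy_Sales_Channel and Vintage in both the accepted and rejected subsets. Note that these are one-sample tests: they compare each subset mean with 0, not with the other subset or with the population mean, so on their own they do not show that the two groups differ. The sample estimates do suggest a difference for Age (43.3 vs 38) and Policy_Sales_Channel (92.7 vs 115), whereas the Vintage means are both about 154; confirming this requires a two-sample test.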
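To compare the accepted and rejected groups directly, a two-sample (Welch) t statistic can be approximated from the one-sample output above. The sketch below is in Python and is not part of the original R analysis. It assumes that, because each test was run against a mean of 0, the standard error can be recovered as mean / t; the inputs are the rounded values printed above, so the result is only approximate. Both function names are hypothetical.

```python
import math

def standard_error_from_one_sample(mean, t_stat):
    """Recover the standard error from a one-sample t-test against mu = 0,
    where t = mean / standard_error."""
    return mean / t_stat

def welch_t(mean_a, se_a, mean_b, se_b):
    """Welch two-sample t statistic from group means and standard errors."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# Rounded values copied from the one-sample output for Age.
se_accepted = standard_error_from_one_sample(43.3, 759)
se_rejected = standard_error_from_one_sample(38.0, 1379)

t_age = welch_t(43.3, se_accepted, 38.0, se_rejected)
print("approximate Welch t for Age:", round(t_age, 1))
```

Applying the same calculation to Vintage, where both printed means are 154, gives a statistic of roughly 0 at this level of rounding, which is why the one-sample results should not be read as evidence that the groups differ on every variable.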
We use the Chi-square (χ²) test to establish whether the categorical variables (Gender, Driving License, Region Code, Previously Insured, Vehicle Age and Vehicle Damage) depend on Response. We have used the “Test of Independence”, whose null hypothesis is that the two variables are independent. If the p-value is less than 0.05, which is our alpha, we reject the null hypothesis and conclude that the variables are not independent, so the variable is statistically significant for our model.
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: ct
## X-squared = 1014, df = 1, p-value <2e-16
## [1] "Alpha value is set as 0.05 and p -value from Pearson's test is: 1.54578483273509e-222"
## [1] "Gender is not independent of response"
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: ct
## X-squared = 34, df = 1, p-value = 6e-09
## [1] "Alpha value is set as 0.05 and p -value from Pearson's test is: 6.29491414277065e-09"
## [1] "Driving License is not independent of response"
##
## Pearson's Chi-squared test
##
## data: ct
## X-squared = 2958, df = 4, p-value <2e-16
## [1] "Alpha value is set as 0.05 and p -value from Pearson's test is: 0"
## [1] "Region Code is not independent of response"
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: ct
## X-squared = 43092, df = 1, p-value <2e-16
## [1] "Alpha value is set as 0.05 and p -value from Pearson's test is: 0"
## [1] "Previously Insured is not independent of response"
##
## Pearson's Chi-squared test
##
## data: ct
## X-squared = 18063, df = 2, p-value <2e-16
## [1] "Alpha value is set as 0.05 and p -value from Pearson's test is: 0"
## [1] "Vehicle Age is not independent of response"
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: ct
## X-squared = 46489, df = 1, p-value <2e-16
## [1] "Alpha value is set as 0.05 and p -value from Pearson's test is: 0"
## [1] "Vehicle Damage is not independent of response"
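The test of independence reported above can be sketched by hand for a 2×2 table. The Python example below is an illustration, not the code used in this report: it computes the Pearson statistic with the Yates continuity correction that the R output mentions, and obtains the p-value from the identity P(χ²₁ > x) = erfc(√(x/2)), which holds for one degree of freedom only. The counts are assumed for illustration, and `chi_square_2x2` is a hypothetical helper name.

```python
import math

def chi_square_2x2(table, yates=True):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            diff = abs(observed - expected)
            if yates:
                # Continuity correction, capped so it never exceeds |O - E|.
                diff -= min(0.5, diff)
            statistic += diff ** 2 / expected

    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Assumed counts: rows are two categories of a predictor,
# columns are Response = 1 and Response = 0.
counts = [[30, 70],
          [10, 90]]

stat, p = chi_square_2x2(counts)
print("X-squared:", stat, "p-value:", p)
print("not independent" if p < 0.05 else "no evidence of dependence")
```

Tables with more than two rows or columns, such as Region Code or Vehicle Age, have more degrees of freedom and need the general chi-square distribution instead of the one-degree-of-freedom shortcut used here.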
After looking at our hypothesis tests, we can conclude that the null hypothesis can be rejected in each case. For the categorical attributes, this means that they have a statistically significant association with our dependent variable, Response, and need to be analysed further. After the tests, we ran a correlation analysis to understand which variables are more strongly related to “Response”, and we can conclude that vehicle_damage, previously_insured and vehicle_age have the highest correlation.
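As a rough illustration of the correlation step, the Python sketch below computes a Pearson correlation between two variables coded as 0/1. It is an assumption that the categorical variables were numerically encoded in this way before correlating; the report does not state the encoding. The two vectors are made-up values, not rows from the dataset, and `pearson` is a hypothetical helper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up 0/1 codings for eight customers (illustrative only).
vehicle_damage = [1, 1, 1, 0, 0, 1, 0, 0]
response = [1, 0, 1, 0, 0, 1, 0, 0]

r = pearson(vehicle_damage, response)
print("correlation:", round(r, 3))
```

For two binary variables this value equals the phi coefficient, so its magnitude is tied to the chi-square statistic of the corresponding 2×2 table.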